Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make mapv_into_any() work for ArcArray, resolves #1280 #1283

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

benkay86
Copy link
Contributor

@benkay86 benkay86 commented May 3, 2023

Note this is a breaking change that will require users of mapv_into_any() to tell the compiler, by annotating the return type, whether the returned array should be an Array or an ArcArray.

src/impl_methods.rs Outdated Show resolved Hide resolved
src/impl_methods.rs Outdated Show resolved Hide resolved
@benkay86 benkay86 force-pushed the master branch 2 times, most recently from e8a73e6 to 2ac3c9f Compare May 4, 2023 17:07
src/data_traits.rs Outdated Show resolved Hide resolved
src/impl_methods.rs Outdated Show resolved Hide resolved
tests/array.rs Outdated
Comment on lines 1002 to 1125
#[test]
fn mapv_into_any_diff_types() {
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]];
assert_eq!(a.mapv_into_any(|a| a.round() as i32 % 2 == 0), a_even);
let b: Array<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0);
assert_eq!(b, a_even);
}

#[test]
fn mapv_into_any_arcarray_same_type() {
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared();
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.);
assert_eq!(b, a_plus_one);
}

#[test]
fn mapv_into_any_arcarray_diff_types() {
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared();
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0);
assert_eq!(b, a_even);
}

#[test]
fn mapv_into_any_diff_outer_types() {
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]];
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.);
assert_eq!(b, a_plus_one);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test here that looks into ArcArray --> Array? That functionality is the original behavior of the function and should remain (or else we'd have to mark as an even greater breaking change). Adding other various tests from the combinatorial set would also be good, e.g., ArcArray<i32> --> Array<f32> or some such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akern40, it's great to see ndarray getting some love! Looking at this with fresh eyes after a year, I realized the implementation can be further simplified to avoid breaking changes. The implementation is similar to the original version of mapv_into_any(). The trait bounds have been rewritten to be generic over the input memory representation (e.g. OwnedRepr vs OwnedArcRepr), but the output is constrained to have the same memory representation as the input. This addresses the issue in #1280 without requiring any type hinting.

I have added some additional doctests and removed the unit test converting between Array <--> ArcArray.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@benkay86 this is awesome! I'm a big fan of the new implementation, I think it's clean. I'm going to close out the other threads in this review; I just had one more request about changing the assert to debug_assert (unless you can think of places where the types and traits are satisfied but the assert fails).

@benkay86 benkay86 force-pushed the master branch 4 times, most recently from 44deee5 to fcc9967 Compare October 24, 2024 20:11
Copy link
Collaborator

@akern40 akern40 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! I like that this approach is consistent with other places in the crate where we output an array of the same type, and the trait bounds appear to be simpler, too.

// have the same kind of memory representation (OwnedRepr vs
// OwnedArcRepr), then their memory representations should be the
// same type, e.g. OwnedRepr<A> == OwnedRepr<B>
assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as RawDataSubst<B>>::Output>());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this to a debug_assert? My impression is that this checks a condition that we think the type system should be ensuring, but we'd like the assert just to be sure; in other words, it's a little mini test case. Unless there are conditions in which the types and trait bounds are satisfied but this assert is false?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are 100% correct! The type equality should always be satisfied -- the assertions are just a little sanity check around the unsafe code. We totally can change this to a debug_assert!(). The use of debug_assert!() appears to be consistent with the rest of the ndarray codebase.

I will point out that the compiler nominally optimizes away the assertion if it can prove the types are equal after monomorphization. For example, comment out line 6 of this example -- the assembly output does not change.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah! Had no idea, but I suppose that makes sense - I do think it just expands into a conditional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and changes the assert!() to a debug_assert!() as we discussed. Is there anything else you would like to see changed for this PR to be merged?

@akern40
Copy link
Collaborator

akern40 commented Oct 30, 2024

@benkay86 I was doing some thinking about this method not just in the context of concrete types but also in the context of using it in a method. Let's take the following example that's a play on your tests: a function that uses mapv_into_any to compute an array modulo some value, which we'll also take as input. Under the current version of ndarray, that signature would look like

fn is_modulo<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> Array<bool, D>

while under this new version the signature is

fn is_modulo<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> ArrayBase<<S as RawDataSubst<bool>>::Output, D>

Now, the verbosity is one thing - sometimes we need verbose types, we shouldn't always be afraid of them. My bigger concern is that we've actually erased the trait bounds that we originally had on S: we (the humans) know that the output is : Data, but Rust doesn't know that. We're then left with an array that lacks any method that requires more than S: RawData.

Sorry I don't have a solution to this, at least not yet, and I've got some other things I'm trying to pull together around the repo. Happy to keep the conversation up, though - maybe I'm missing something? Or if you want to bounce ideas / solutions around. Really appreciate your persistence on this PR.

@benkay86
Copy link
Contributor Author

I appreciate your help thinking about these trait bounds. The combination of fancy trait bounds + unsafe code can be difficult to reason about!

With regard to the verbosity in the return value of your function signature, there may be ways to modify RawDataSubst or define other helper traits to make writing generic functions that use mapv_into_any() more ergonomic. A related problem you have stumbled upon is that while this PR is backwards-compatible with calls to mapv_into_any() using concrete types, it introduces new trait bounds and is thus a potentially breaking change for generic functions calling mapv_into_any() which would themselves need new trait bounds.

I don't think your second concern about "erasing" trait bounds is valid. (And if we could erase them... would that be a bad thing?) The trait bounds on mapv_into_any() require S: DataMut which constrains S to be an OwnedRepr, OwnedArcRepr, CowRepr, or ViewRepr. An additional trait bound ArrayBase<<S as RawDataSubst<B>>::Output, D>: From<Array<B,D>> requires that we can make the output type from an OwnedRepr, which effectively forbids S: ViewRepr. So, for better or worse, we can only use OwnedRepr, OwnedArcRepr, or CowRepr.

I have added a test case with CowArray.


If you want to go in a totally different direction, I could try to rewrite mapv_into_any() as a trait implemented only by some ArrayBase, akin to ndarray_linalg::eig::Eig. This trait could have an associated type Output that simplifies the return values for genetic methods calling mapv_into_any(). We might be able to write bespoke implementations for OwnedRepr, OwnedArcRepr, and CowRepr.

@akern40
Copy link
Collaborator

akern40 commented Oct 30, 2024

Sorry I should've been a little clearer when I was talking about "erasing" trait bounds. I'm worried about the output, not the input.

So, for better or worse, we can only use OwnedRepr, OwnedArcRepr, or CowRepr

This is my point: we know that, but the compiler doesn't.

(EDITED, for a better example)

Let's say I use mapv_into_any with the new signature. Then the following code:

fn parity<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> bool
where --snip-- // Trait bounds that enable `mapv_into_any` and modulo, but I'm too lazy to write right now
{
    let output = arr.mapv_into_any(|a| a.round() as i32 % 2 == 0);
    // I can only call a limited set of functions on `output`, those that only have the `RawData` bound.
    // So I can't map, can't sum, can't take the first or last, can't view, etc
}

Idk, maybe I'm being ridiculous. mapv_into_any already has a frustrating 'static requirement in several places, so is anyone really calling it in generic code? Maybe not. What do you make of this concern?

@benkay86
Copy link
Contributor Author

Ah, I see what you mean now. We could, without loss of generality for extant types in ndarray, add the following additional trait bounds to mapv_into_any(). Indeed, if we were going to define a helper trait, we would want to include these!

pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
    where
        // ...
       <S as ndarray::RawDataSubst<i32>>::Output: DataOwned + DataMut,
{
    // ...
}

However, it is not actually necessary to place this trait bound in the definition of mapv_into_any(). It can be added to your parity() function as needed. For a complete example, let us begin by defining an alternative to ArrayBase::round(). (To your previous point, the required trait bounds are rather verbose.)

use num_traits;

fn round<S, A, D>(arr: ArrayBase<S, D>) -> ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>
where
    S: ndarray::DataMut<Elem = A> + ndarray::RawDataSubst<i32> + 'static,
    A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> + 'static,
    D: Dimension,
    ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>: From<Array<i32, D>>,
{
    arr.mapv_into_any(|a| a.round().as_())
}

#[test]
fn test_round() {
    let a: Array<f32, _> = array![1.1, 2.6, 3.0];
    let a_rounded: Array<i32, _> = array![1, 3, 3];
    let b = round(a);
    assert_eq!(b, a_rounded);
}

Now let's try something like your parity() function. We will have to add one additional trait bound to our definition (the last one in the where clause) to convince the compiler that we can call ArrayBase::sum(). But it does work once we have all the right trait bounds!

fn round_sum<S, A, D>(arr: ArrayBase<S, D>) -> i32
where
    S: ndarray::DataMut<Elem = A> + ndarray::RawDataSubst<i32> + 'static,
    A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> + 'static,
    D: Dimension,
    ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>: From<Array<i32, D>>,
    <S as ndarray::RawDataSubst<i32>>::Output: ndarray::Data,
{
    let output = arr.mapv_into_any(|a| a.round().as_());
    output.sum()
}

#[test]
fn test_round_sum() {
    let a: Array<f32, _> = array![1.1, 2.6, 3.0];
    let b = round_sum(a);
    assert_eq!(b, 7);
}

As for the 'static trait bounds, note that this means something different from static lifetimes and is not as limiting as you might think.
https://doc.rust-lang.org/rust-by-example/scope/lifetime/static_lifetime.html#trait-bound

@akern40
Copy link
Collaborator

akern40 commented Oct 30, 2024

It can be added to your parity() function as needed

image

Might be time for me to sleep! Really excellent analysis here. I'm in favor of adding that bound to the function, as that would provide clearer error messages to users who may otherwise be confused as to why they can't call additional functions.

I need to go to bed, but my last consideration now is the verbosity; let's put a bit of thought into whether there's an obvious path forward on lowering it. Happy to think about it this weekend a bit, or if you've got the time and inclination you can throw a design out.

(P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going well!)

@benkay86
Copy link
Contributor Author

benkay86 commented Nov 1, 2024

OK, I've been playing around with the ergonomics. Here's one approach using a helper trait with (stable) generalized associated types to cut down on the trait bounds. The helper trait is something like:

pub trait MappableData: RawData + 'static {
    type Subst<B>: Data<Elem = B> + DataOwned;
    fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>;
}
impl<A: 'static> MappableData for OwnedRepr<A> {
    type Subst<B> = OwnedRepr<B>;
    fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>
    {
        arr
    }
}
impl<A: 'static> MappableData for OwnedArcRepr<A> {
    type Subst<B> = OwnedArcRepr<B>;
    fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>
    {
        arr.into()
    }
}

The implementation of mapv_into_any() then boils down to:

impl<A, S, D> ArrayBase<S, D>
where
    S: DataMut<Elem = A> + MappableData,
    D: Dimension,
    A: Clone + 'static,
{
    pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as MappableData>::Subst<B>, D>
    where
        F: FnMut(A) -> B,
        B: 'static,
    {
        if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>() {
            let f = |a| {
                let b = f(a);
                unsafe { unlimited_transmute::<B, A>(b) }
            };
            let output = self.mapv_into(f);
            debug_assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as MappableData>::Subst<B>>());
            unsafe { unlimited_transmute::<ArrayBase<S, D>, ArrayBase<<S as MappableData>::Subst<B>, D>>(output) }
        } else {
            <S as MappableData>::from_owned_repr::<B,D>(self.mapv(f))
        }
    }
}

Now we can have more ergonomic calls. For concrete types:

let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_even = a.mapv_into_any(|a| a.round() as i32 % 2 == 0)

Function with generic element type but concrete data representation:

fn round<A, D>(arr: Array<A, D>) -> Array<i32, D>
where
    A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
    D: Dimension,
{
   arr.mapv_into_any(|a| a.round().as_())
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_rounded = round(a);

Function generic over element and data representation:

fn round<A, S, D>(arr: ArrayBase<S, D>) -> ArrayBase<<S as MappableData>::Subst<i32>, D>
where
    S: ndarray::DataMut<Elem = A> + MappableData,
    A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
    D: Dimension,
{
    arr.mapv_into_any(|a| a.round().as_())
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_rounded = round(a);

Generic function operating on the (generic) output of mapv_into_any():

fn round_sum<S, A, D>(arr: ArrayBase<S, D>) -> i32
where
    S: ndarray::DataMut<Elem = A> + MappableData,
    A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
    D: Dimension,
{
    let output = arr.mapv_into_any(|a| a.round().as_());
    output.sum()
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_sum = round_sum(a);

Happy to let these ideas percolate a bit more, but let me know what you think. If we go this route then we'll have to decide on what to call the helper trait MappableData and what file/module to declare it in.

(P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going w

Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research.

@akern40
Copy link
Collaborator

akern40 commented Nov 4, 2024

I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name): DataSusbt. This would be a subtrait of RawDataSubst and would do two things:

  1. Restrict the RawDataSubst associated type to be DataOwned
  2. Define a from_owned function that can map from an array into the associated type of RawDataSubst
pub trait DataSubst<A>: RawDataSubst<A> + DataOwned
where Self::Output: DataOwned
{
    fn from_owned<D: Dimension>(self_: Array<A, D>) -> ArrayBase<Self::Output, D>;
}

I like the name DataSubst because it reveals both the relationship to RawDataSubst, upon which this depends. This would be implemented for OwnedRepr, OwnedArcRepr, and CowRepr.

Unfortunately, this isn't quite as succinct for users as yours, but it has the advantage of not repeating the associated types within the library, which I like. The definition of mapv_into_any becomes

pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
where
    S: DataMut + DataSubst<B, Output: DataOwned>,
    F: FnMut(A) -> B,
    A: Clone + 'static,
    B: 'static,
{
    if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>() {
        // A and B are the same type.
        // Wrap f in a closure of type FnMut(A) -> A .
        let f = |a| {
            let b = f(a);
            // Safe because A and B are the same type.
            unsafe { unlimited_transmute::<B, A>(b) }
        };
        // Delegate to mapv_into() using the wrapped closure.
        // Convert output to a uniquely owned array of type Array<A, D>.
        let output = self.mapv_into(f).into_owned();
        // Change the return type from Array<A, D> to Array<B, D>.
        // Again, safe because A and B are the same type.
        unsafe { unlimited_transmute::<Array<A, D>, ArrayBase<<S as RawDataSubst<B>>::Output, D>>(output) }
    } else {
        // A and B are not the same type.
        // Fallback to mapv().
        S::from_owned(self.mapv(f))
    }
}

Rust won't directly suggest the clean DataSubst<B, Output: DataOwned> but we can put something about that in the documentation for one or both of DataSubst and mapv_into_any.

Thoughts?

Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research.

I did my computer science degrees there! Exciting to see this library being used for research-level neuroscience.

@akern40
Copy link
Collaborator

akern40 commented Nov 4, 2024

Oh and I don't think you need the S: 'static bound? It seems to be compiling for me without it

@benkay86
Copy link
Contributor Author

benkay86 commented Nov 4, 2024

Oh and I don't think you need the S: 'static bound? It seems to be compiling for me without it

It's only needed if you want the extra debug_assert!() to work. I think we could just drop the debug_asesrt!(), especially if there is some way to run the code through miri...

I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name): DataSusbt. This would be a subtrait of RawDataSubst...

I considered this possibility since the trait is, conceptually, a natural extension of RawDataSubst. Your proposed DataSubst trait is parameterized by its element type A in parallel construction to the RawDataSubst trait. As I think you've noticed, this makes it impossible to do what MappableData lets you do with generalized associated types:

impl <A, S, D> ArrayBase<S, D>
where
    S: DataMut<Elem=A> + MappableData

Instead you must introduce the constraint at the definition of mapv_into_any() where the type B is known:

pub fn mapv_into_any<B, F>(self, mut f: F> -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
where
    S: DataMut<Elem = A> + DataSubst<B, Output: DataOwned>

But now every function calling mapv_into_any() that wants to be generic over S must specify the same S: DataMut<Elem = A> + DataSubst<B, Output: DataOwned> bound. ☹️

Unfortunately, the GAT version of MappableData can't be a subtrait of the non-GAT RawDataSubst<B>. It's unfortunate that GATs were not available when RawDataSubst was written. Rewriting RawDataSubst now would be a very big breaking change. Your DataSubst<B> trait is certainly more elegant, but necessarily more verbose...

This would be implemented for OwnedRepr, OwnedArcRepr, and CowRepr.

I'm wondering if mapv_into_any() needs to support CowRepr? I've never used CowArray, can you point me to any examples of how you would use one?

@akern40
Copy link
Collaborator

akern40 commented Nov 5, 2024

It's only needed if you want the extra debug_assert!() to work. I think we could just drop the debug_asesrt!(), especially if there is some way to run the code through miri...

I recently got Miri support working as part of our CI/CD, so it actually will run Miri the next time you push!

It's unfortunate that GATs were not available when RawDataSubst was written

This is true, but unfortunately I think it's trickier than that: implementing RawDataSubst as a GAT for ViewRepr requires that your generic B live at least as long as the accompanying type A:

impl<'a, A: 'a> RawDataSubst for ViewRepr<&'a A>
{
    type Output<B> = ViewRepr<&'a B>; // Problem occurs here

    unsafe fn data_subst<B>(self) -> Self::Output<B>
    {
        ViewRepr::new()
    }
}

You can't write Output<B: 'a> because you don't have a lifetime 'a to use when implementing RawDataSubst for the other types. And making RawDataSubst generic on 'a causes lifetimes to suddenly boil up your data traits. There might be a way to get that design right, but it's a bigger change than this method, as you pointed out.

Your DataSubst<B> trait is certainly more elegant, but necessarily more verbose...

Ya, like I said, I was trying to find a middle ground. It's still significantly more succinct than the implementation without the additional trait, but it isn't quite as terse as the MappableData trait. It's a trade-off I'm happy with, but it's still a trade-off.

I'm wondering if mapv_into_any() needs to support CowRepr? I've never used CowArray, can you point me to any examples of how you would use one?

No reason it shouldn't, especially if it means a quick impl for the DataSubst trait. For an example, check out as_standard_layout, which returns a CowArray. This avoids a copy for arrays which are already in standard layout.

@benkay86
Copy link
Contributor Author

benkay86 commented Nov 5, 2024

To make MappableData work with CowRepr without having lifetimes "boil up your data traits," it seems like we can use higher-rank trait bounds to get away with something like this. (I confess this is not something I am expert on.) Then we can just use MappableData in our impls without needing any MappableData<'a>.

pub trait MappableData: RawData
{
    type Subst<B>: Data<Elem = B> + DataOwned where for<'b> B: 'b;
    fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D> where for<'b> B: 'b;
}
// omitting other impls for brevity
impl<'a, A> MappableData for CowRepr<'a, A>
{
    type Subst<B> = CowRepr<'a, B> where for<'b> B: 'b;
    fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D> where for<'b> B: 'b
    {
        arr.into()
    }
}

EDIT

It looks like the HRB for<'b> B: 'b implies 'b: static. That's all well and good for mapv_into_any() since we already need the static trait bound for TypeId to work. But it would not work well in general. You would indeed have the lifetime boiling up your data traits like MappableData<'a>. (Forum link.)


To your broader points about balancing maintainability with ergonomics, my counterargument is:

  • Helper traits are not uncommon in the ndarray ecosystem -- consider ndarray-linalg.
  • If there are future GAT-powered versions of RawDataSubst and your proposed DataSubst trait, then MappableData could be made into a subtrait of DataSubst without breaking anything.
  • Signaling to users that MappableData is a helper for mapv_into_any() rather than something more general would make it easier to modify in the future.

The final choice is yours. If you still prefer the DataSubst<A> route then I will do another push to modify the PR accordingly.

@akern40
Copy link
Collaborator

akern40 commented Nov 9, 2024

Thanks for giving me time to think about this! After a bit of experimenting this morning, I like your approach. I'd like to propose a few changes:

  1. Let's rename MappableData to DataMappable, to make it conform with DataOwned, DataMut, etc.
  2. Let's implement DataMappable for all of the storage types. Array, ArcArray, and CowArray will map to themselves, and the view types (both raw and not) will map to Array. This change is necessary because otherwise maps_into_any breaks even more by ceasing to be callable on views.
  3. I'd suggest the following language for the docs of DataMappable (which should live in data_traits.rs), but feel free to alter as you see fit:
/// An array representation trait for mapping to an owned array with a different element type.
///
/// Functions such as [`mapv_into_any`](ArrayBase::mapv_into_any) that alter an array's
/// underlying element type often want to preserve the storage type (e.g., `ArcArray`)
/// of the original array. However, because Rust considers `OwnedRepr<A>` and `OwnedRepr<B>`
/// to be completely different types, a trait must be used to indicate what the mapping is.
///
/// This trait will map owning storage types to the element-swapped version of themselves;
/// view types are mapped to `OwnedRepr`. The following table summarizes the mappings:
///
/// | Original Storage Type   | Corresponding Array Type | Mapped Storage Type | Output of `from_owned` |
/// | ----------------------- | ------------------------ | ------------------- | ---------------------- |
/// | `OwnedRepr<A>`          | `Array<A, D>`            | `OwnedRepr<B>`      | `Array<B, D>`          |
/// | `OwnedArcRepr<A>`       | `ArcArray<A, D>`         | `OwnedArcRepr<B>`   | `ArcArray<B, D>`       |
/// | `CowRepr<'a, A>`        | `CowArray<'a, A, D>`     | `CowRepr<'a, B>`    | `CowArray<'a, B, D>`   |
/// | `ViewRepr<&'a A>`       | `ArrayView<'a, A, D>`    | `OwnedRepr<B>`      | `Array<B, D>`          |
/// | `RawViewRepr<*const A>` | `RawArrayView<A, D>`     | `OwnedRepr<B>`      | `Array<B, D>`          |
pub trait DataMappable: RawData
{
    /// The element-swapped, owning storage representation
    type Subst<B>: Data<Elem = B> + DataOwned;

    /// Cheaply convert an owned [`Array<B, D>`] to a different owning array type, as dictated by `Subst`.
    ///
    /// The owning arrays implement `From`, which is the preferred method for changing the underlying storage.
    /// This method (and trait) should be reserved for dealing with changing the element type.
    fn from_owned<B, D: Dimension>(self_: Array<B, D>) -> ArrayBase<Self::Subst<B>, D>;
}

@akern40
Copy link
Collaborator

akern40 commented Nov 9, 2024

If you can put DataMappable in data_traits.rs, implement it as described in the table (where I've left of the Mut versions of the types for brevity, but they should get implementations too), and run format, I would be thrilled to merge this and include it in the next release of ndarray.

@benkay86
Copy link
Contributor Author

benkay86 commented Dec 3, 2024

Thank you for your patience. I have finally had a chance to work on this. Please let me know what you think.

  • I am not sure how to go about writing a sound implementation of mapv_into_any() for RawViewRepr. If you want, I can still implement DataMappable for RawViewRepr.
  • There is no advantage to implementing mapv_into_any() for ViewRepr (except for API consistency, which is kind of important). If you have an ArrayView then it will be no more performant than mapv(). If you have an ArrayViewMut and you are mapping between two identical types then map_inplace() is what you want. I have noted this in the documentation. Furthermore, supporting this makes the implementation of mapv_into_any more complicated because it permits returning a data representation that is different than the input representation (e.g. ViewRepr -> OwnedRepr). Consider not implementing MappableData for ViewRepr to encourage users to do the right thing.
  • I am not sure how to auto-format my code correctly. Running cargo fmt on the stable channel changes many lines of code that are not mine, so it would appear some other auto-format tool is being used by the project.

@akern40
Copy link
Collaborator

akern40 commented Dec 20, 2024

@benkay86 now it's my turn to thank you - I had to get our BLAS CI/CD situation handled before I could go back to other PRs.

I like all of your suggestions - let's leave the DataMappable implementation to only owned types. The formatter is cargo +nightly fmt.

You've done tremendous work here. If you would like me to cover the last mile I'd be happy to make the necessary changes.

@benkay86
Copy link
Contributor Author

Thank you for kind words @akern40! I am happy to make the requested changes unless there is some tight release deadline that I am getting in the way of. After having (yet more) time to think about this, I have two thoughts I want to run by you.

Number One

It turns out there are two roughly-equally-efficient ways to implement mapv_into_any(). The current version converts the input data representation to OwnedRepr, delegates to mapv_into(), and then converts back to the original data representation from OwnedRepr. This is entirely safe, but might involve a little bit of overhead when repackaging the pointer to the data. Then it transmutes from data type A to data type B, which is safe because A and B are the same type, therefore <S as DataMappable>::Subst<A> must be the same type as <S as DataMappable>::Subst<B>.

The other implementation is to call mapv_into() on the input data representation directly. Then it transmutes from S to <S as DataMappable>::Subst<B>. This is actually unsound because while A and B are the same type, S and <S as DataMappable>::Subst<_> might not be the same data representation. For example, if the input is a ViewRepr then the output is an OwnedRepr and we can't just transmute between the two. To make the transmute sound, we should make DataMappable unsafe to implement and document that the input data representation must always be the same as the output representation.

Do you have a preference between these two implementation strategies?

Number Two

As a final point for thought, I wonder if this PR should be merged at all, or if we should mark the original issue #1280 as won't fix. @RReverser never did give us an example of where he thought returning an ArcArray would make his code more efficient. He wrote:

It would be great if, like mapv_into, it was generic over storage types and allowed to clone ArcArray in-place when no copy is necessary.

In fact, the existing implementation of mapv_into_any() already does this:

let output = self.mapv_into(f).into_owned();

For an ArcArray this ends up calling this implementation of into_owned(), which repackages the memory of the existing, uniquely-owned ArcArray in place. Performing a transmute might save a little bit of overhead here (if the compiler doesn't optimize it out), but there is no wholesale reallocation going on behind the scenes. It's a similar story for CowArray.

Pros of this PR: obviously the current situation is confusing to users, and this PR creates some syntactic/generic sugar to make mapv_into_any() more ergonomic to use with shared data. If we don't merge this then, at the very least, we ought to document that calling mapv_into_any() on ArcArray or CowArray will not unnecessarily clone the data when converting to Array.

Cons: Rust struggles with trait solving and generic return types in particular. Adding a generic return type and the DataMappable trait make the learning curve that much steeper for ndarray and authors of generic functions in particular.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants